Representation of Texts into String Vectors for Text Categorization

Similar resources

Representation of Texts into String Vectors for Text Categorization

In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. Traditional approaches to text categorization usually require encoding documents into numerical vectors, which causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based appr...

Encoding Words into String Vectors for Word Categorization

In this research, we propose a string vector based K Nearest Neighbor as an approach to word categorization. Previous work on text categorization successfully encoded texts into string vectors, avoiding the drawbacks of encoding them into numerical vectors; this motivates the present research. In this research, we encode words into string vectors ...

Text Representation for Automatic Text Categorization

Automatic Text Categorization (ATC), the automatic assignment of text documents to predefined classes, is a language engineering task relevant to a number of applications, including automatic content and knowledge management in corporations and on the Internet, information access and filtering, etc. With the first works dating back to the 1960s [14], and increased work in the last decade (see the surv...

AutoPCS: A Phrase-Based Text Categorization System for Similar Texts

Nearly all text classification methods classify texts into predefined categories according to the terms that appear in the texts. State-of-the-art text classification methods prefer to simply take a word as a term, since this performs well on some well-known datasets; some experts have even pointed out that phrases do not improve classification accuracy, or improve it only marginally. However, we found out that this i...

Inverted Index based Modified Version of KNN for Text Categorization

This research proposes a new strategy in which documents are encoded into string vectors and a modified version of KNN is adapted to string vectors for text categorization. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in...

Journal

Journal title: Journal of Computing Science and Engineering

Year: 2010

ISSN: 1976-4677

DOI: 10.5626/jcse.2010.4.2.110